This document contains code to perform clustering on the HumanPilot brain dataset using the top few UMAP coordinates together with the two spatial coordinates.
Our reasoning is that performing clustering on a combined set of molecular feature dimensions and spatial dimensions (e.g. concatenating the two spatial dimensions as additional columns with the top 50 PCs) is a simple and intuitive way to combine the molecular and spatial data. However, using 2 spatial dimensions together with 50 PCs (or even all 1942 highly variable genes) risks swamping the spatial information, so the clustering is dominated by the molecular dimensions.
Using the top few UMAP coordinates (instead of top 50 PCs or 1942 HVGs) is a simple way to reduce as much as possible of the molecular information into a small number of dimensions. Then these top 5-10 dimensions can be combined with the 2 spatial dimensions on a more equal basis.
Also important is to ensure that the UMAP coordinates and spatial dimensions are on roughly similar scales. For example, if the top UMAP coordinate ranges from say -5 to +5, then the spatial dimensions should also be scaled to approximately this range. (In addition, UMAP dimensions and spatial dimensions should not be z-scaled – z-scaling UMAP coordinates would scale up the less meaningful dimensions; and z-scaling spatial dimensions does not make sense since these exist on a uniform scale with physical meaning.)
The following code runs clustering using UMAP coordinates plus spatial dimensions, with a few different attempts at parameter tuning. The main parameters (and other design choices) are:
suppressPackageStartupMessages(library(SingleCellExperiment))
suppressPackageStartupMessages(library(uwot))
suppressPackageStartupMessages(library(scran))
suppressPackageStartupMessages(library(scater))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(RColorBrewer))
Load pre-processed SingleCellExperiment data object.
# load scran output file (containing top 50 molecular PCs and 2 spatial coordinates)
load("../../data/Human_DLPFC_Visium_processedData_sce_scran.Rdata")
sce
## class: SingleCellExperiment
## dim: 33538 47681
## metadata(1): image
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(5): gene_id gene_version gene_name gene_source
## gene_biotype
## colnames(47681): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
## colData names(19): barcode sample_name ... key cell_count
## reducedDimNames(6): PCA TSNE_perplexity50 ... TSNE_perplexity80
## UMAP_neighbors15
## spikeNames(0):
## altExpNames(0):
# select cells from one sample
ix <- colData(sce)$sample_name == 151673
sce <- sce[, ix]
sce
## class: SingleCellExperiment
## dim: 33538 3639
## metadata(1): image
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(5): gene_id gene_version gene_name gene_source
## gene_biotype
## colnames(3639): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(19): barcode sample_name ... key cell_count
## reducedDimNames(6): PCA TSNE_perplexity50 ... TSNE_perplexity80
## UMAP_neighbors15
## spikeNames(0):
## altExpNames(0):
Extract principal components (PCs) and spatial coordinates from data object.
# extract PCs
dims_pcs <- reducedDim(sce, type = "PCA")
stopifnot(nrow(dims_pcs) == ncol(sce))
# extract spatial dimensions
dims_spatial <- colData(sce)[, c("imagecol", "imagerow")]
stopifnot(nrow(dims_spatial) == ncol(sce))
Calculate UMAP dimensions/coordinates/components. Will aim to use the top few (e.g. 5-10) UMAP components for clustering.
Question: how much of the overall heterogeneity do these top few components capture? Maybe need some additional analyses to investigate this.
Note: calculate UMAP on top 50 PCs for faster runtime (could also calculate on all 1942 highly variable genes instead for more accuracy)
Note: calculating UMAP on one sample only (could also calculate UMAP on all samples combined, then subset for clustering)
Note: could also use top few PCs for clustering (instead of top few UMAP components). Maybe need some additional analyses to show the benefit of using UMAP vs. PCs.
# keep top 50 UMAP components
set.seed(123)
dims_umap <- umap(dims_pcs, n_components = 50)
stopifnot(nrow(dims_umap) == ncol(sce))
Need all dimensions (UMAP and spatial) to be on approximately comparable scales.
UMAP dimensions are already on a sensible scale, so can leave as is (note: don’t do z-score scaling since this will scale up the less meaningful UMAP compenents)
summary(dims_umap)
## V1 V2 V3 V4
## Min. :-5.9683 Min. :-0.4662351 Min. :-2.43302 Min. :-0.57427
## 1st Qu.:-0.1451 1st Qu.:-0.2600787 1st Qu.:-1.31960 1st Qu.:-0.23484
## Median : 0.9647 Median :-0.0001477 Median :-0.71714 Median : 0.02704
## Mean : 0.0000 Mean : 0.0000000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 1.6725 3rd Qu.: 0.2405742 3rd Qu.: 0.01786 3rd Qu.: 0.15711
## Max. : 2.5713 Max. : 0.5380302 Max. : 5.11824 Max. : 0.72873
## V5 V6 V7 V8
## Min. :-0.54490 Min. :-0.25690 Min. :-1.4100 Min. :-1.07071
## 1st Qu.:-0.25571 1st Qu.:-0.09851 1st Qu.:-0.6270 1st Qu.:-0.18054
## Median :-0.09029 Median :-0.01514 Median :-0.0927 Median : 0.06571
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.07080 3rd Qu.: 0.07189 3rd Qu.: 0.3290 3rd Qu.: 0.36262
## Max. : 1.04638 Max. : 0.31764 Max. : 1.8935 Max. : 0.80204
## V9 V10 V11 V12
## Min. :-0.176303 Min. :-0.3606 Min. :-0.19148 Min. :-0.47781
## 1st Qu.:-0.058986 1st Qu.:-0.1102 1st Qu.:-0.10588 1st Qu.:-0.15836
## Median :-0.001474 Median : 0.0409 Median :-0.04443 Median :-0.03414
## Mean : 0.000000 Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.051638 3rd Qu.: 0.1160 3rd Qu.: 0.03509 3rd Qu.: 0.18429
## Max. : 0.231652 Max. : 0.2684 Max. : 0.37590 Max. : 0.51893
## V13 V14 V15 V16
## Min. :-0.509388 Min. :-0.147833 Min. :-0.279465 Min. :-0.39823
## 1st Qu.:-0.190398 1st Qu.:-0.029815 1st Qu.:-0.121698 1st Qu.:-0.13984
## Median : 0.001397 Median : 0.006838 Median :-0.002903 Median : 0.02766
## Mean : 0.000000 Mean : 0.000000 Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.218475 3rd Qu.: 0.044795 3rd Qu.: 0.100184 3rd Qu.: 0.13001
## Max. : 0.492934 Max. : 0.094353 Max. : 0.368139 Max. : 0.33441
## V17 V18 V19 V20
## Min. :-1.185669 Min. :-0.21786 Min. :-0.18488 Min. :-0.14275
## 1st Qu.:-0.409938 1st Qu.:-0.11130 1st Qu.:-0.04715 1st Qu.:-0.07003
## Median :-0.003977 Median :-0.02158 Median :-0.01930 Median :-0.01510
## Mean : 0.000000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.489540 3rd Qu.: 0.04937 3rd Qu.: 0.01679 3rd Qu.: 0.06739
## Max. : 1.156438 Max. : 0.35617 Max. : 0.32452 Max. : 0.16377
## V21 V22 V23 V24
## Min. :-0.397652 Min. :-0.27171 Min. :-0.22696 Min. :-0.72424
## 1st Qu.:-0.115875 1st Qu.:-0.10680 1st Qu.:-0.12143 1st Qu.:-0.06031
## Median : 0.002783 Median :-0.01195 Median :-0.02672 Median : 0.05360
## Mean : 0.000000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.141298 3rd Qu.: 0.12815 3rd Qu.: 0.13467 3rd Qu.: 0.22943
## Max. : 0.309702 Max. : 0.29570 Max. : 0.23972 Max. : 0.47937
## V25 V26 V27 V28
## Min. :-0.34244 Min. :-0.294733 Min. :-0.37593 Min. :-0.27736
## 1st Qu.:-0.19769 1st Qu.:-0.161978 1st Qu.:-0.15699 1st Qu.:-0.08581
## Median :-0.07857 Median : 0.005442 Median :-0.06162 Median : 0.03237
## Mean : 0.00000 Mean : 0.000000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.10579 3rd Qu.: 0.161660 3rd Qu.: 0.17684 3rd Qu.: 0.08975
## Max. : 0.56479 Max. : 0.311006 Max. : 0.45290 Max. : 0.19894
## V29 V30 V31 V32
## Min. :-0.30721 Min. :-0.27314 Min. :-0.33414 Min. :-0.26486
## 1st Qu.:-0.04586 1st Qu.:-0.11898 1st Qu.:-0.15658 1st Qu.:-0.05436
## Median : 0.01108 Median : 0.02261 Median :-0.03855 Median : 0.02366
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.09476 3rd Qu.: 0.11200 3rd Qu.: 0.16583 3rd Qu.: 0.07470
## Max. : 0.22020 Max. : 0.25037 Max. : 0.41914 Max. : 0.16321
## V33 V34 V35 V36
## Min. :-0.35551 Min. :-0.30824 Min. :-0.249745 Min. :-0.32431
## 1st Qu.:-0.15042 1st Qu.:-0.09021 1st Qu.:-0.090202 1st Qu.:-0.05389
## Median :-0.04160 Median : 0.01097 Median :-0.008285 Median : 0.04394
## Mean : 0.00000 Mean : 0.00000 Mean : 0.000000 Mean : 0.00000
## 3rd Qu.: 0.05838 3rd Qu.: 0.11102 3rd Qu.: 0.073775 3rd Qu.: 0.10007
## Max. : 0.48544 Max. : 0.30381 Max. : 0.373197 Max. : 0.19262
## V37 V38 V39 V40
## Min. :-0.45406 Min. :-0.199878 Min. :-0.32056 Min. :-0.8506
## 1st Qu.:-0.10652 1st Qu.:-0.066289 1st Qu.:-0.01951 1st Qu.:-0.3265
## Median : 0.01659 Median :-0.002477 Median : 0.04383 Median : 0.0450
## Mean : 0.00000 Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.16504 3rd Qu.: 0.064654 3rd Qu.: 0.08811 3rd Qu.: 0.3353
## Max. : 0.36911 Max. : 0.215660 Max. : 0.16772 Max. : 0.6746
## V41 V42 V43 V44
## Min. :-0.37070 Min. :-0.70018 Min. :-0.2737014 Min. :-0.31280
## 1st Qu.:-0.12290 1st Qu.:-0.24762 1st Qu.:-0.1242063 1st Qu.:-0.10055
## Median :-0.01451 Median :-0.02004 Median : 0.0004435 Median : 0.01488
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000000 Mean : 0.00000
## 3rd Qu.: 0.13139 3rd Qu.: 0.27063 3rd Qu.: 0.1111930 3rd Qu.: 0.10172
## Max. : 0.40938 Max. : 0.70055 Max. : 0.2917181 Max. : 0.26751
## V45 V46 V47 V48
## Min. :-0.48443 Min. :-0.32025 Min. :-0.42392 Min. :-0.257488
## 1st Qu.:-0.04187 1st Qu.:-0.10484 1st Qu.:-0.13312 1st Qu.:-0.119057
## Median : 0.06096 Median : 0.02491 Median :-0.02089 Median : 0.006523
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.000000
## 3rd Qu.: 0.13876 3rd Qu.: 0.11175 3rd Qu.: 0.15299 3rd Qu.: 0.122430
## Max. : 0.25433 Max. : 0.22103 Max. : 0.41335 Max. : 0.249050
## V49 V50
## Min. :-0.71702 Min. :-0.1974963
## 1st Qu.:-0.06714 1st Qu.:-0.0538026
## Median : 0.08924 Median : 0.0008243
## Mean : 0.00000 Mean : 0.0000000
## 3rd Qu.: 0.22013 3rd Qu.: 0.0531311
## Max. : 0.42241 Max. : 0.1842943
mean(dims_umap[, 1])
## [1] 3.743539e-17
sd(dims_umap[, 1])
## [1] 2.433886
max(abs(dims_umap))
## [1] 5.968269
range(dims_umap[, 1])
## [1] -5.968269 2.571303
range(dims_umap[, 2])
## [1] -0.4662351 0.5380302
range(dims_umap[, 3])
## [1] -2.433016 5.118238
colnames(dims_umap) <- paste0("UMAP_", seq_len(ncol(dims_umap)))
Spatial dimensions: scale to e.g. min -5 and max 5, so they are on roughly similar scale as top few UMAP dimensions (note: z-score scaling doesn’t really make sense for spatial coordinates, which are on a uniform physical scale)
Note: choice of these max and min values is very important! results will be highly sensitive to this
summary(as.data.frame(dims_spatial))
## imagecol imagerow
## Min. :139.3 Min. :109.7
## 1st Qu.:234.3 1st Qu.:212.2
## Median :314.9 Median :305.2
## Mean :315.4 Mean :302.4
## 3rd Qu.:396.8 3rd Qu.:392.2
## Max. :497.8 Max. :521.3
range(dims_spatial[, 1])
## [1] 139.3339 497.8398
range(dims_spatial[, 2])
## [1] 109.6760 521.3322
dims_spatial <- apply(as.matrix(dims_spatial), 2, function(col) {
(col - min(col)) / (max(col) - min(col)) * 10 - 5
})
colnames(dims_spatial) <- c("spatial_x", "spatial_y")
summary(dims_spatial)
## spatial_x spatial_y
## Min. :-5.00000 Min. :-5.0000
## 1st Qu.:-2.34999 1st Qu.:-2.5090
## Median :-0.10168 Median :-0.2509
## Mean :-0.08905 Mean :-0.3179
## 3rd Qu.: 2.18052 3rd Qu.: 1.8640
## Max. : 5.00000 Max. : 5.0000
stopifnot(nrow(dims_spatial) == ncol(sce))
Now can run standard Bioconductor graph-based clustering on subset of UMAP dimensions and scaled spatial dimensions.
Note: graph-based clustering seems better suited than k-means for this dataset, since the “layers” in brain data do not have an ellipsoidal shape in the spatial feature space. However, for other datasets, e.g. cancer data, k-means may also work.
# number of UMAP dimensions to use
n_umap <- 10
dims_clus <- cbind(dims_umap[, seq_len(n_umap), drop = FALSE], dims_spatial)
head(dims_clus)
## UMAP_1 UMAP_2 UMAP_3 UMAP_4 UMAP_5
## AAACAAGTATCTCCCA-1 1.6574038 0.211001493 -1.6190826 -0.04793527 -0.23406501
## AAACAATCTACTAGCA-1 0.2069426 -0.002405581 -0.0430752 0.08569449 -0.13459698
## AAACACCAATAACTGC-1 -5.8309770 -0.320130656 5.0541569 0.10144563 0.87353051
## AAACAGAGCGACTCCT-1 1.7260043 0.288170335 -1.1166769 0.22265336 -0.25734260
## AAACAGCTTTCAGAAG-1 -0.2511220 0.126470323 -0.4256998 -0.28863546 0.08052089
## AAACAGGGTCTATATT-1 -3.4370591 -0.255081248 3.0760216 0.03140612 0.54291218
## UMAP_6 UMAP_7 UMAP_8 UMAP_9 UMAP_10
## AAACAAGTATCTCCCA-1 -0.08746051 -0.6257519 0.37865797 -0.03104449 -0.09756495
## AAACAATCTACTAGCA-1 -0.02234639 -0.1052133 0.07261529 0.01877433 -0.01385205
## AAACACCAATAACTGC-1 0.26400144 1.7359890 -0.99021557 0.02240689 0.11743483
## AAACAGAGCGACTCCT-1 -0.15140254 -0.9492557 0.51211873 -0.10523939 -0.20221511
## AAACAGCTTTCAGAAG-1 0.07183660 0.2142187 -0.16271552 0.10042115 0.02814349
## AAACAGGGTCTATATT-1 0.13047067 1.0395340 -0.57179739 0.02704665 0.11103763
## spatial_x spatial_y
## AAACAAGTATCTCCCA-1 3.404469 1.5934186
## AAACAATCTACTAGCA-1 -1.644489 -4.5954958
## AAACACCAATAACTGC-1 -3.779814 2.7271236
## AAACAGAGCGACTCCT-1 2.751695 -3.1261616
## AAACAGCTTTCAGAAG-1 -4.627165 0.6258883
## AAACAGGGTCTATATT-1 -4.285714 1.1517437
# clustering: see OSCA book
# note: number of clusters k
# note: use transpose
g <- buildSNNGraph(t(dims_clus), k = 10, d = ncol(dims_clus))
g_walk <- igraph::cluster_walktrap(g)
# default number of clusters (not using this for final results)
#clus <- g_walk$membership
#table(clus)
#stopifnot(length(clus) == ncol(sce))
# choose number of clusters
n_clus <- 8
clus <- igraph::cut_at(g_walk, n = n_clus)
table(clus)
## clus
## 1 2 3 4 5 6 7 8
## 1316 355 456 862 340 256 35 19
stopifnot(length(clus) == ncol(sce))
note: display plot on original spatial coordinates
d_plot <- data.frame(
# get original spatial coordinates (non-scaled)
# note: y coordinate is reversed
x_coord = colData(sce)[, c("imagecol")],
y_coord = -colData(sce)[, c("imagerow")],
cluster = as.factor(clus)
)
ggplot(d_plot, aes(x = x_coord, y = y_coord, color = cluster)) +
geom_point(size = 2, alpha = 0.5) +
coord_fixed() +
scale_color_brewer(palette = "Paired") +
theme_bw() +
ggtitle("Clustering on top few UMAP dims plus 2 spatial dims (scaled)")
filename <- "../plots/clustering_UMAP_spatial/plot_clustering_UMAP_spatial.png"
ggsave(filename, width = 6, height = 6)
Question: is there any benefit compared to clustering on UMAP components only? or using PCs only?
# clustering using UMAP components only
dims_clus2 <- dims_umap[, seq_len(n_umap), drop = FALSE]
head(dims_clus2)
## UMAP_1 UMAP_2 UMAP_3 UMAP_4 UMAP_5 UMAP_6
## [1,] 1.6574038 0.211001493 -1.6190826 -0.04793527 -0.23406501 -0.08746051
## [2,] 0.2069426 -0.002405581 -0.0430752 0.08569449 -0.13459698 -0.02234639
## [3,] -5.8309770 -0.320130656 5.0541569 0.10144563 0.87353051 0.26400144
## [4,] 1.7260043 0.288170335 -1.1166769 0.22265336 -0.25734260 -0.15140254
## [5,] -0.2511220 0.126470323 -0.4256998 -0.28863546 0.08052089 0.07183660
## [6,] -3.4370591 -0.255081248 3.0760216 0.03140612 0.54291218 0.13047067
## UMAP_7 UMAP_8 UMAP_9 UMAP_10
## [1,] -0.6257519 0.37865797 -0.03104449 -0.09756495
## [2,] -0.1052133 0.07261529 0.01877433 -0.01385205
## [3,] 1.7359890 -0.99021557 0.02240689 0.11743483
## [4,] -0.9492557 0.51211873 -0.10523939 -0.20221511
## [5,] 0.2142187 -0.16271552 0.10042115 0.02814349
## [6,] 1.0395340 -0.57179739 0.02704665 0.11103763
g <- buildSNNGraph(t(dims_clus2), k = 10, d = ncol(dims_clus2))
g_walk <- igraph::cluster_walktrap(g)
clus2 <- igraph::cut_at(g_walk, n = n_clus)
table(clus2)
## clus2
## 1 2 3 4 5 6 7 8
## 2833 272 106 89 92 144 82 21
stopifnot(length(clus2) == ncol(sce))
# clustering using top 50 PCs only
dim(dims_pcs)
## [1] 3639 50
g <- buildSNNGraph(t(dims_pcs), k = 10, d = ncol(dims_pcs))
g_walk <- igraph::cluster_walktrap(g)
clus3 <- igraph::cut_at(g_walk, n = n_clus)
table(clus3)
## clus3
## 1 2 3 4 5 6 7 8
## 166 204 684 1199 839 254 152 141
stopifnot(length(clus3) == ncol(sce))
# clustering using top few PCS plus spatial coordinates
n_pcs <- 10
# check scaling used previously to compare with UMAP coordinates still makes sense
range(dims_pcs[, 1])
## [1] -21.043732 6.036751
range(dims_pcs[, 2])
## [1] -8.142405 10.103168
range(dims_pcs[, 3])
## [1] -6.626644 7.163784
# no does not; need to rescale spatial dimensions to a more comparable scale
dims_spatial4 <- apply(as.matrix(dims_spatial), 2, function(col) {
(col - min(col)) / (max(col) - min(col)) * 40 - 20
})
range(dims_spatial4[, 1])
## [1] -20 20
range(dims_spatial4[, 2])
## [1] -20 20
# clustering
dims_clus4 <- cbind(dims_pcs[, seq_len(n_pcs), drop = FALSE], dims_spatial4)
head(dims_clus4)
## PC1 PC2 PC3 PC4 PC5
## AAACAAGTATCTCCCA-1 2.8833511 1.7285837 -0.782443502 -0.01344705 -1.20379693
## AAACAATCTACTAGCA-1 -0.6533501 -0.2294782 0.712580805 3.03055063 -0.92720998
## AAACACCAATAACTGC-1 -18.5700667 6.7446614 -2.324388224 1.60699708 -1.72073261
## AAACAGAGCGACTCCT-1 1.7054961 0.1959069 -2.507596978 -1.45367508 -1.86536411
## AAACAGCTTTCAGAAG-1 0.1991185 3.3944691 1.790684594 -0.45860635 -0.37320043
## AAACAGGGTCTATATT-1 -4.0856709 4.9539729 -0.002558349 -0.21943921 0.03340281
## PC6 PC7 PC8 PC9 PC10
## AAACAAGTATCTCCCA-1 0.6135239 0.8828088 0.5227014 0.2435614 0.8096759
## AAACAATCTACTAGCA-1 0.2433001 2.9532478 0.2841878 0.5876416 1.8839005
## AAACACCAATAACTGC-1 0.1807558 -2.2990174 0.3079935 -0.4782843 0.2362033
## AAACAGAGCGACTCCT-1 1.1122413 0.7295621 -0.1749797 -0.6134731 -2.3941236
## AAACAGCTTTCAGAAG-1 1.6694618 0.1336294 -0.4442021 -1.2576295 1.3277132
## AAACAGGGTCTATATT-1 2.6228252 -0.6490638 0.3300910 -1.3305860 0.7891792
## spatial_x spatial_y
## AAACAAGTATCTCCCA-1 13.617876 6.373674
## AAACAATCTACTAGCA-1 -6.577956 -18.381983
## AAACACCAATAACTGC-1 -15.119257 10.908495
## AAACAGAGCGACTCCT-1 11.006779 -12.504646
## AAACAGCTTTCAGAAG-1 -18.508662 2.503553
## AAACAGGGTCTATATT-1 -17.142857 4.606975
g <- buildSNNGraph(t(dims_clus4), k = 10, d = ncol(dims_clus4))
g_walk <- igraph::cluster_walktrap(g)
clus4 <- igraph::cut_at(g_walk, n = n_clus)
table(clus4)
## clus4
## 1 2 3 4 5 6 7 8
## 1109 503 749 755 211 232 31 49
stopifnot(length(clus4) == ncol(sce))
# comparison plots
d_plot2 <- data.frame(
x_coord = colData(sce)[, c("imagecol")],
y_coord = -colData(sce)[, c("imagerow")],
cluster = as.factor(clus2)
)
d_plot3 <- data.frame(
x_coord = colData(sce)[, c("imagecol")],
y_coord = -colData(sce)[, c("imagerow")],
cluster = as.factor(clus3)
)
d_plot4 <- data.frame(
x_coord = colData(sce)[, c("imagecol")],
y_coord = -colData(sce)[, c("imagerow")],
cluster = as.factor(clus4)
)
d_plot_combined <- rbind(
cbind(d_plot, method = "UMAP_spatial"),
cbind(d_plot2, method = "UMAP"),
cbind(d_plot3, method = "PCs"),
cbind(d_plot4, method = "PCs_spatial")
)
d_plot_combined$method <- factor(
d_plot_combined$method,
levels = c("UMAP_spatial", "UMAP", "PCs_spatial", "PCs")
)
head(d_plot_combined)
## x_coord y_coord cluster method
## 1 440.6391 -381.0981 3 UMAP_spatial
## 2 259.6310 -126.3276 6 UMAP_spatial
## 3 183.0783 -427.7678 5 UMAP_spatial
## 4 417.2367 -186.8137 4 UMAP_spatial
## 5 152.7003 -341.2691 1 UMAP_spatial
## 6 164.9415 -362.9163 2 UMAP_spatial
table(d_plot_combined$method)
##
## UMAP_spatial UMAP PCs_spatial PCs
## 3639 3639 3639 3639
# plots
ggplot(d_plot_combined, aes(x = x_coord, y = y_coord, color = cluster)) +
facet_wrap(~ method) +
geom_point(size = 2, alpha = 0.5) +
coord_fixed() +
scale_color_brewer(palette = "Paired") +
theme_bw() +
ggtitle("Clustering comparison: (i) UMAP plus spatial, (ii) UMAP only, (iii) PCs plus spatial, (iv) PCs only")
filename <- "../plots/clustering_UMAP_spatial/plot_clustering_UMAP_spatial_vs_UMAP_vs_PCs_spatial_vs_PCs.png"
ggsave(filename, width = 10, height = 11)